The Evolution of Time Series Models in Machine Learning
Vispi Karkaria • May 2026
Time series modeling has undergone a major transformation. The evolution reflects a broader shift in ML: from handcrafted assumptions about temporal behavior to data-driven systems that learn temporal representations directly from data.
Today, time series systems are central across finance, healthcare, manufacturing, digital twins, climate science, recommendation systems, and scientific ML. Temporal systems require models to reason about dynamics, memory, uncertainty, causality, and long-horizon interactions.
Era 1: Classical Statistical Methods
The earliest generation was dominated by statistical forecasting techniques including ARIMA, SARIMA, Kalman filters, exponential smoothing, and Gaussian processes [1]. These methods were mathematically grounded, interpretable, and computationally efficient but relied heavily on handcrafted assumptions and struggled with highly nonlinear dynamics or large-scale multivariate systems.
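As an illustration of the Era 1 style, here is a minimal sketch of simple exponential smoothing, one of the classical techniques listed above. The smoothing factor `alpha` is a hand-chosen hyperparameter, which is exactly the kind of handcrafted assumption these methods relied on:

```python
# Simple exponential smoothing: each estimate blends the latest observation
# with the previous estimate, weighted by a fixed smoothing factor alpha.
def exponential_smoothing(series, alpha=0.5):
    """Return one-step-ahead smoothed values; the forecast is the last value."""
    smoothed = [series[0]]  # initialize with the first observation
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed

history = [10.0, 12.0, 11.0, 13.0, 12.5]
print(exponential_smoothing(history, alpha=0.5))  # → [10.0, 11.0, 11.0, 12.0, 12.25]
```

The appeal is clear: the model is interpretable (one parameter controls memory) and runs in linear time, but it cannot capture nonlinear dynamics, which motivated the shift to Era 2.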
Era 2: Deep Learning
The rise of deep learning significantly changed the field. Key developments:
- RNNs, LSTMs, GRUs, Temporal Convolutional Networks (TCNs), and sequence-to-sequence forecasting models for learning temporal features directly from raw sequential data.
- Attention-based architectures and Transformers (after Vaswani et al. [4]), replacing recurrence with self-attention to model global temporal interactions more efficiently.
Important milestones:
- LSTMs, introduced by Hochreiter and Schmidhuber [2], modeled long-term temporal dependencies more effectively than earlier recurrent networks.
- TCNs — Bai et al. [3] demonstrated that temporal convolutional networks often outperformed recurrent architectures on sequence modeling tasks while being easier to parallelize and train.
- Transformers enabled systems to capture long-range interactions, scale to larger datasets, and parallelize training efficiently.
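The core operation behind the Transformer milestone above is scaled dot-product attention [4]. A minimal NumPy sketch (dimensions are illustrative, and this single-head version omits projections and masking):

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention over a sequence of T time steps."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # pairwise similarity between time steps
    # Numerically stable softmax over the sequence axis.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V  # each output is a weighted mix of all values

T, d = 4, 8  # sequence length and model dimension (illustrative sizes)
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((T, d)) for _ in range(3))
out = attention(Q, K, V)
print(out.shape)  # → (4, 8)
```

Because every time step attends to every other in one matrix product, long-range interactions are captured without sequential recurrence, which is what makes training parallelizable.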
Era 3: Foundation Models (2020s)
The most recent shift is from task-specific forecasting models to generalizable time series foundation models.
Key architectures:
- Informer [5] — efficient attention mechanisms for long-sequence forecasting
- Autoformer [6] — decomposition strategies for temporal modeling
- FEDformer [7] — frequency domain learning and multiscale temporal modeling
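The decomposition strategy mentioned for Autoformer [6] can be sketched simply: a moving average extracts the slow trend, and the residual is treated as the seasonal component. The window size and edge-padding choices below are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def decompose(x, window=5):
    """Split a series into trend (moving average) and seasonal (residual) parts."""
    pad = window // 2
    # Replicate edge values so the smoothed trend has the same length as x.
    padded = np.concatenate([np.repeat(x[0], pad), x, np.repeat(x[-1], pad)])
    kernel = np.ones(window) / window
    trend = np.convolve(padded, kernel, mode="valid")
    seasonal = x - trend
    return trend, seasonal

t = np.arange(50)
x = 0.1 * t + np.sin(2 * np.pi * t / 10)  # linear trend + seasonal cycle
trend, seasonal = decompose(x, window=9)
print(trend.shape, seasonal.shape)  # → (50,) (50,)
```

Modeling the two components separately lets the network devote capacity to the hard part (the seasonal residual) while the trend is handled by a cheap smoothing operation.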
Recent systems like TimesFM [8] and Chronos [9] reflect the trend toward universal forecasting models capable of transfer learning across domains.
Another major development: integration of time series learning into scientific ML and digital twin systems, combining temporal modeling, spatial reasoning, operator learning, uncertainty quantification, and physics-informed learning.
Key Trends
- Time series models are evolving from isolated forecasting systems toward foundation-style temporal reasoning architectures capable of transfer learning and generalization across domains.
- Modern systems increasingly combine forecasting, reasoning, uncertainty quantification, and physics-informed learning into unified temporal intelligence frameworks.
The Road Ahead
The most important question is shifting from "Can a model forecast the next step?" to "Can AI systems learn the underlying dynamics governing how complex systems evolve through time?"
References
[1] Box, G.E.P., Jenkins, G.M., et al. "Time Series Analysis: Forecasting and Control." Wiley, 1970.
[2] Hochreiter, S. and Schmidhuber, J. "Long short-term memory." Neural Computation 9.8 (1997): 1735-1780.
[3] Bai, S., Kolter, J.Z., and Koltun, V. "An empirical evaluation of generic convolutional and recurrent networks for sequence modeling." arXiv:1803.01271 (2018).
[4] Vaswani, A., et al. "Attention is all you need." NeurIPS 30 (2017).
[5] Zhou, H., et al. "Informer: Beyond efficient transformer for long sequence time-series forecasting." AAAI 35.12 (2021): 11106-11115.
[6] Wu, H., et al. "Autoformer: Decomposition transformers with auto-correlation for long-term series forecasting." NeurIPS 34 (2021).
[7] Zhou, T., et al. "FEDformer: Frequency enhanced decomposed transformer for long-term series forecasting." ICML (2022).
[8] Das, A., et al. "A decoder-only foundation model for time-series forecasting." ICML (2024).
[9] Ansari, A.F., et al. "Chronos: Learning the language of time series." arXiv:2403.07815 (2024).